
    iCrawl: Improving the Freshness of Web Collections by Integrating Social Web and Focused Web Crawling

    Researchers in the Digital Humanities and journalists need to monitor, collect and analyze, on demand, fresh online content about current events such as the Ebola outbreak or the Ukraine crisis. However, existing focused crawling approaches consider only topical aspects and ignore temporal ones, so they cannot produce Web collections that are both thematically coherent and fresh. Social Media in particular provide a rich source of fresh content, yet state-of-the-art focused crawlers do not use it. In this paper we address the problem of collecting fresh and relevant Web and Social Web content for a topic of interest by seamlessly integrating Web and Social Media in a novel focused crawler. The crawler collects Web and Social Media content in a single system and exploits the stream of fresh Social Media content to guide the crawl. Comment: Published in the Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries 201
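    The idea of letting fresh Social Media posts steer a focused crawler can be illustrated with a toy crawl frontier. This is a minimal sketch, not iCrawl's actual design: it assumes that each URL linked from a post is scored as topical relevance (a hypothetical keyword-overlap measure) multiplied by freshness (a hypothetical exponential decay with a one-hour half-life), and that the crawler always fetches the highest-scoring URL next. The names `Frontier`, `relevance` and `freshness` are illustrative only.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class FrontierEntry:
    priority: float                   # negated score, because heapq is a min-heap
    url: str = field(compare=False)   # excluded from ordering


def relevance(text, topic_terms):
    """Toy relevance model (assumption): fraction of topic terms present in the text."""
    words = set(text.lower().split())
    return len(words & topic_terms) / len(topic_terms) if topic_terms else 0.0


def freshness(published_ts, now, half_life_s=3600.0):
    """Toy freshness model (assumption): 1.0 for brand-new content, 0.5 after one half-life."""
    age = max(0.0, now - published_ts)
    return 0.5 ** (age / half_life_s)


class Frontier:
    """Priority queue of URLs to crawl, ranked by relevance * freshness."""

    def __init__(self, topic_terms):
        self.topic_terms = {t.lower() for t in topic_terms}
        self._heap = []
        self._seen = set()

    def add(self, url, context_text, published_ts, now):
        if url in self._seen:         # never enqueue the same URL twice
            return
        self._seen.add(url)
        score = relevance(context_text, self.topic_terms) * freshness(published_ts, now)
        heapq.heappush(self._heap, FrontierEntry(-score, url))

    def pop(self):
        """Return (url, score) for the best remaining URL."""
        entry = heapq.heappop(self._heap)
        return entry.url, -entry.priority

    def __len__(self):
        return len(self._heap)


# Usage with hypothetical Social Media posts: (linked URL, post text, post timestamp).
now = 1_000_000.0
frontier = Frontier({"ebola", "outbreak"})
posts = [
    ("http://example.org/old", "ebola outbreak update", now - 7200),
    ("http://example.org/new", "ebola outbreak latest", now - 60),
    ("http://example.org/offtopic", "football results", now),
]
for url, text, ts in posts:
    frontier.add(url, text, ts, now)
print(frontier.pop()[0])  # http://example.org/new
```

    Multiplying the two scores means an off-topic link is never prioritized merely for being new, and an on-topic link loses priority as it ages; a real system would use a proper relevance classifier rather than keyword overlap.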

    Combining design and performance in a data visualization management system

    Interactive data visualizations have emerged as a prominent way to bring data exploration and analysis capabilities to both technical and non-technical users. Despite their ubiquity and importance across applications, multiple design- and performance-related challenges lurk beneath the visualization creation process. To meet these challenges, application designers either use visualization systems (e.g., Endeca, Tableau, and Splunk) that are tailored to domain-specific analyses, or manually design, implement, and optimize their own solutions. Unfortunately, both approaches typically slow down the creation process. In this paper, we describe the status of our progress towards an end-to-end relational approach in our data visualization management system (DVMS). We introduce DeVIL, a SQL-like language to express static as well as interactive visualizations as database views that combine user inpu

    Vamsa: Automated Provenance Tracking in Data Science Scripts

    There has recently been much research on the fairness, bias and explainability of machine learning (ML) models, driven by the self-evident or regulatory requirements of various ML applications. We make the following observation: all of these approaches require a robust understanding of the relationship between ML models and the data used to train them. In this work, we introduce the ML provenance tracking problem: the fundamental idea is to automatically track which columns in a dataset have been used to derive the features/labels of an ML model. We discuss the challenges of capturing such information in the context of Python, the most common language used by data scientists. We then present Vamsa, a modular system that extracts provenance from Python scripts without requiring any changes to the users' code. Using 26K real data science scripts, we verify the effectiveness of Vamsa in terms of coverage and performance. We also evaluate Vamsa's accuracy on a smaller subset of manually labeled data. Our analysis shows that Vamsa's precision and recall range from 90.4% to 99.1% and that its latency is on the order of milliseconds for average-size scripts. Drawing on our experience in deploying ML models in production, we also present an example in which Vamsa helps automatically identify models that are affected by data corruption issues.
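    To make the provenance-tracking problem concrete, the following toy analyzer statically maps each `<model>.fit(X, y)` call in a script to the dataframe columns behind `X` and `y`, without running the script. It is a minimal sketch under strong assumptions and is not how Vamsa works internally: it only follows straight-line, top-level assignments of the form `name = df["col"]` or `name = df[["a", "b"]]`, and it treats any method named `fit` as model training. It relies on Python 3.9+, where `ast.Subscript.slice` holds the index expression directly. The function names are hypothetical.

```python
import ast


def columns_from_subscript(node):
    """Return the column names in df["col"] or df[["a", "b"]], or None otherwise."""
    if not isinstance(node, ast.Subscript):
        return None
    key = node.slice  # Python 3.9+: the index expression itself
    if isinstance(key, ast.Constant) and isinstance(key.value, str):
        return [key.value]
    if isinstance(key, ast.List) and all(
        isinstance(e, ast.Constant) and isinstance(e.value, str) for e in key.elts
    ):
        return [e.value for e in key.elts]
    return None


def track_fit_provenance(source):
    """List the feature/label columns used by every top-level `.fit(X, y)` call."""
    var_columns = {}  # variable name -> source columns it was sliced from
    fits = []

    def resolve(arg):
        # A bare variable is looked up; an inline df[...] is read directly.
        if isinstance(arg, ast.Name):
            return var_columns.get(arg.id)
        return columns_from_subscript(arg)

    for stmt in ast.parse(source).body:
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            name = stmt.targets[0].id
            cols = columns_from_subscript(stmt.value)
            if cols is not None:
                var_columns[name] = cols
            else:
                var_columns.pop(name, None)  # reassigned to something untracked
        elif isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
            call = stmt.value
            if (isinstance(call.func, ast.Attribute) and call.func.attr == "fit"
                    and len(call.args) >= 2):
                fits.append({"features": resolve(call.args[0]),
                             "labels": resolve(call.args[1])})
    return fits


# Hypothetical training script; it is only parsed, so pandas/sklearn need not be installed.
SCRIPT = '''
import pandas as pd
from sklearn.linear_model import LogisticRegression
df = pd.read_csv("loans.csv")
X = df[["age", "income"]]
y = df["defaulted"]
model = LogisticRegression()
model.fit(X, y)
'''
print(track_fit_provenance(SCRIPT))
# [{'features': ['age', 'income'], 'labels': ['defaulted']}]
```

    Real scripts derive features through joins, dropped columns, pipelines and library calls with differing semantics, which is exactly what makes the general problem hard; this sketch handles none of those cases.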

    Constraints on the χ_(c1) versus χ_(c2) polarizations in proton-proton collisions at √s = 8 TeV

    The polarizations of promptly produced χ_(c1) and χ_(c2) mesons are studied using data collected by the CMS experiment at the LHC, in proton-proton collisions at √s = 8 TeV. The χ_c states are reconstructed via their radiative decays χ_c → J/ψγ, with the photons being measured through conversions to e⁺e⁻, which allows the two states to be well resolved. The polarizations are measured in the helicity frame, through the analysis of the χ_(c2) to χ_(c1) yield ratio as a function of the polar or azimuthal angle of the positive muon emitted in the J/ψ → μ⁺μ⁻ decay, in three bins of J/ψ transverse momentum. While no differences are seen between the two states in terms of azimuthal decay angle distributions, they are observed to have significantly different polar anisotropies. The measurement favors a scenario where at least one of the two states is strongly polarized along the helicity quantization axis, in agreement with nonrelativistic quantum chromodynamics predictions. This is the first measurement of significantly polarized quarkonia produced at high transverse momentum.
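    Why a yield ratio versus the muon polar angle is sensitive to a polarization difference can be seen from the generic form of the dimuon decay distribution. The sketch below assumes ϑ is the polar angle of the μ⁺ in the helicity frame and λ_ϑ^{χ_{cJ}} is the polar anisotropy of the J/ψ → μ⁺μ⁻ decay for J/ψ mesons coming from χ_{cJ} decays. After integrating over the azimuthal angle, the φ-dependent terms of the full distribution vanish, leaving only the cos²ϑ term. The symbols are illustrative, and the paper's exact parameterization may differ.

```latex
W(\cos\vartheta) \propto 1 + \lambda_\vartheta \cos^2\vartheta
\quad\Longrightarrow\quad
R(\cos\vartheta)
  = \frac{N_{\chi_{c2}}(\cos\vartheta)}{N_{\chi_{c1}}(\cos\vartheta)}
  \propto \frac{1 + \lambda_\vartheta^{\chi_{c2}} \cos^2\vartheta}
               {1 + \lambda_\vartheta^{\chi_{c1}} \cos^2\vartheta}
```

    If the two anisotropies were equal, R would be flat in cos ϑ. A ratio that varies with cos ϑ therefore signals different polar anisotropies for the two states, which is the kind of difference the abstract reports.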